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LONG-TERM  GOALS 

The  long  term  objective  of  our  research  for  the  “High  Resolution  Air-Sea  Interaction”  (HRES) 
Departmental  Research  Initiative  (DRI)  is  to  identify  the  couplings  between  large  wave  events, 
winds,  and  currents  in  the  surface  layer  of  the  marine  boundary  layers.  Turbulence  resolving 
large  eddy  simulations  (LESs)  and  direct  numerical  simulations  (DNSs)  of  the  marine  atmo¬ 
spheric  boundary  layer  (MABL)  in  the  presence  of  time  and  space  varying  wave  fields  will  be 
the  main  tools  used  to  elucidate  wind- wave-current  interactions.  A  suite  of  turbulence  simula¬ 
tions  over  realistic  seas  using  idealized  and  observed  pressure  gradients  will  be  carried  out  to 
compliment  the  field  observations  collected  in  moderate  to  high  winds.  The  database  of  simu¬ 
lations  will  be  used  to  generate  statistical  moments,  interrogated  for  coherent  structures,  and 
ultimately  used  to  compare  with  IIRES  observations. 


OBJECTIVES 

Our  near  term  goals  are:  1)  participate  in  the  planning  process  for  the  HRES  field  campaign; 
and  2)  construct  an  LES  code  applicable  to  the  HRES  high  wind  regime.  In  order  to  accomplish 
the  latter  goal  we  are  improving  the  parallelization  of  our  base  LES  code  and  developing  an  al¬ 
gorithm  to  allow  simulations  of  turbulent  winds  over  nearly  arbitrary  3-D  wave  fields. 


APPROACH 


We  plan  on  investigating  interactions  between  the  MABL  and  the  connecting  air-sea  interface 
using  both  LES  and  DNS.  The  waves  will  be  externally  imposed:  (1)  based  on  well  established 
empirical  wave  spectra;  or  (2)  ultimately  provided  by  direct  observations  of  the  sea  surface  from 
field  campaigns.  The  main  technical  advance  is  the  development  of  a  computational  tool  that 
allows  for  nearly  arbitrary  3-D  wave  fields,  z.e.,  the  sea  surface  elevation  7]  =  r](z,y,t)  as  a  sur¬ 
face  boundary  condition.  The  computational  method  will  allow  time  and  space  varying  surface 
conditions  over  a  range  of  wave  scales  0(lO)m  or  larger. 
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WORK  COMPLETED 


During  the  past  year  we  attended  a  PI  meeting  in  Irvine,  CA  to  outline  the  HRES  field  cam¬ 
paign.  The  discussion  foeused  on  the  atmospheric  measurements  to  be  collected  from  aircraft 
and  from  R/V  FLIP  with  particular  attention  paid  to  the  surface  layer  pressure  measurements. 
The  field  campaign  strategy  will  be  further  refined  at  a  meeting  in  Monterey,  CA  in  early  De¬ 
cember.  A  summary  of  the  discussion  is  available  from  C.  Friehe  (U.C.  Irvine). 

In  an  effort  to  link  modeling  and  observations,  high-resolution  turbulence  calculations  will  be 
carried  out  as  part  of  the  HRES  program.  In  anticipation  of  these  computationally-intensive 
simulations  and  the  need  for  a  more  general  surface  boundary  capability  we  undertook  several 
code  enhancements.  First,  we  built  and  tested  a  now  highly  parallel  algorithm  for  our  “flat'5 
LES  code,  i.e.,  the  baseline  LES  code  with  a  flat  lower  boundary.  This  allows  us  to  design  and 
implement  the  code  parallelization  without  the  additional  complexities  of  a  moving  lower  bound¬ 
ary.  The  resulting  code  will  be  further  modified  for  HRES  applications. 

A  detailed  description  of  the  parallel  algorithm  and  an  evaluation  of  the  code  performance  for 
varying  problem  sizes  and  machine  architectures  is  provided  in  Sullivan  &  Patton  (2008).  The 
design  criteria  for  the  new  parallel  algorithm  are:  1)  accomplish  2-D  domain  decomposition  us¬ 
ing  solely  MPI  (Message  Passing  Interface)  parallelization;  and  2)  preserve  our  well  established 
numerical  algorithm  based  on  pseudospectral  (FFT)  spatial  differencing  and  Runge-Kutta  time 
stepping  (Sullivan  et  al ,  1996).  The  ability  to  use  2-D  domain  decomposition,  i.e.,  splitting  the 
computational  work  across  two  computational  directions,  has  significant  advantages  in  pseu¬ 
dospectral  simulation  codes  as  it  allows  direct  numerical  simulations  of  isotropic  turbulence 
on  large  meshes,  20483  or  more  (e.g.,  Pekurovsky  et  a/.,  2006).  In  our  2-D  decomposition  each 
processor  operates  on  three-dimensional  “bricks”  sub-sampled  in  x,  y,  or  z  directions.  Brick- 
to-brick  communication  is  a  combination  of  transposes  and  ghost  point  exchange.  To  preserve 
pseudospectral  differencing  in  the  horizontal  directions  a  custom  MPI  matrix  transpose  was  de¬ 
signed  and  implemented.  Finally,  a  new  pressure  Poisson  solver  was  coded  to  match  the  domain 
decomposition  (further  details  are  given  in  Sullivan  &  Patton,  (2008)). 

With  these  enhancements  our  new  algorithm  allows  very  large  number  of  processors  0(  104)  to 
be  utilized.  In  our  previous  code  the  total  number  of  MPI  tasks  is  limited  by  the  number  of 
vertical  gridpoints  typically  O(102).  This  increased  flexibility  allows  simulations  in  anisotropic 
computational  domains,  for  example  boxes  with  large  horizontal  and  small  vertical  extents.  The 
transpose  and  ghost-point  exchange  routines  are  general  and  allow  arbitrary  numbers  of  mesh 
points,  although  the  best  performance  is  realized  when  the  load  is  balanced  across  processors. 
Single  files,  similar  to  FORTRAN  direct  access  files,  are  written  and  read  using  MPI  I/O.  We 
find  MPI  I/O  makes  the  code  robust  across  different  machine  architectures  and  simplifies  the 
logic  required  for  restarts,  especially  if  the  number  of  processors  changes  during  the  course  of  a 
simulation.  Finally,  the  code  is  compliant  with  the  FORTRAN-90  programming  standard. 

An  example  of  the  code  scaling  for  varying  workload  as  a  function  of  the  total  number  of  pro¬ 
cessors  NP  is  provided  in  figures  1  and  2  for  3  different  machine  architectures.  NP  =  N Pz  X 
N Pxy  where  NPZ  and  NPxy  are  the  number  of  processors  in  the  vertical  and  horizontal  direc¬ 
tions,  respectively.  In  each  figure,  the  vertical  axis  is  total  computational  time  t  x  N P  divided 
by  total  work.  Nz  is  the  number  of  vertical  levels  and  Mx  .y  is  proportional  to  the  FFT  work  in 
the  x  and  y  directions.  Ideal  scaling  corresponds  to  a  flat  line  with  increasing  number  of  proces¬ 
sors.  The  timing  tests  illustrate  the  present  scheme  exhibits  both  strong  scaling  (problem  size  is 
held  fixed  and  the  number  of  processors  is  increased)  and  weak  scaling  (the  problem  size  grows 
as  the  number  of  processors  increases  so  the  amount  of  work  per  processor  is  held  constant)  over 
a  wide  range  of  problem  sizes  and  is  able  to  use  as  many  as  16,384  processors,  i.e.,  the  maxi- 


mum  number  available  to  our  application  on  a  Cray  XT41.  Further,  the  results  are  robust  for 
varying  combinations  of  ( NPZ .  NPxy).  Generally,  the  performance  only  begins  to  degrade  when 
the  number  of  processors  exceeds  about  8  times  the  smallest  dimension  in  the  problem  owing  to 
increases  in  communication  overhead. 


RESULTS 

Parallel  codes  allow  one  to  simulate  PBLs  on  fine  meshes  which  permits  a  wider  range  of  scale 
interactions  and  thereby  lessens  the  impact  of  subgrid-scale  parameterizations.  In  Sullivan 
Patton  (2008)  we  examine  the  sensitivity  and  convergence  of  LES  solutions  as  the  grid  mesh  is 
varied  from  323  to  10243  for  the  classical  problem  of  daytime  dry  PBL  convection  (Nieuwstadt 
ct  al. (1093)  describes  this  flow  regime).  The  grid  sensitivity  study  shows  that  the  LES  solutions 
converge  reasonably  well  onlv  for  meshes  greater  than  2563,  i.e.,  when  the  ratio  of  the  bound¬ 
ary  layer  height  to  the  vertical  mesh  spacing  zxj  Az  >  130.  We  find  the  skewness  of  vertical 
velocity  Su  highlights  the  solution  sensitivity  to  grid  resolution.  The  variation  of  Sw  with  mesh 
resolution  is  a  consequence  of  a  Smagorinsky  closure  which  neglects  third-order  SGS  moments; 
subgrid-scale  skewness  has  a  pronounced  local  maximum  in  the  surface  layer  and  near  the  PBL 
inversion.  Flow  visualization  of  the  fine  mesh  5123  and  10243  solutions  also  illustrates  the  im¬ 
pact  of  fine  mesh  resolution  on  coherent  structures.  In  figures  3  and  4,  we  observe  a  a  number  of 
intriguing  structural  features,  viz.,  large  scale  plumes  coupled  to  small  scale  vortical  dust  devils 
which  only  emerge  with  fine  mesh  resolution.  The  highly  parallel  code  described  above  will  be¬ 
come  the  baseline  code  for  the  time  evolving  wavy  boundary  computations  in  IIRES.  We  expect 
that  the  small  scale  vortical  structures  and  statistics  found  in  the  canonical  convective  PBL, 
as  well  as  neutral  flow  statistics  and  structures,  will  be  modulated  by  high  winds  and  waves  in 
the  marine  surface  layer.  Finally,  we  also  note  that  our  parallel  code  is  immediately  useful  for 
simulations  of  the  ocean  mixed  layer  in  the  OXR  initiative  Impact  of  Typhoons  in  the  Western 
Pacific  Ocean. 


IMPACT  /APPLICATIONS 


The  computational  tools  developed  and  the  database  of  numerical  solutions  generated  will  aide 
in  the  interpretation  of  the  observations  gathered  during  the  HRES  field  campaigns.  In  addition 
idealized  process  studies  performed  with  the  simulations  have  the  potential  to  improve  parame¬ 
terization  of  surface  drag  under  high  wind  conditions  in  large  scale  models. 


TRANSITIONS  &  RELATED  PROJECTS 


We  are  currently  engaged  in  analyzing  data  collected  during  the  Ocean  Horizontal  Array  Tur¬ 
bulence  Study  (OHATS)  and  the  Coupled  Boundary  Layers  Air-Sea  Transfer  (CBLAST)  field 
campaigns.  These  are  joint  efforts  between  NCAR,  and  numerous  university  investigators.  Also 
the  present  work  has  links  to  the  ONR  DRI  on  the  impact  of  typhoons  in  the  Western  Pacific 
Ocean  (ITOP). 

Computations  on  the  Cray  XT4  Franklin”  were  carried  out  by  Dr.  E.  Patton  (NCAR)  with  computer  time 
provided  by  the  National  Energy  Research  Scientific  Computing  Center  which  is  managed  by  the  U.S.  Depart¬ 
ment  of  Energy. 
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NP  Processors 

Figure  1:  Computational  time  per  gridpoint  for  different  combinations  of  problem  size  and  2-D 
domain  decomposition  for  the  Cray  XT4  (an  example  of  strong  scaling),  a)  green  lines  and  sym¬ 
bols  problem  size  5123;  b)  red  lines  and  symbols  10243;  c)  black  lines  and  symbols  20483;  and  d) 
blue  symbol  30723.  For  a  given  number  of  total  processors  N  P  the  symbols  are  varying  vertical 
and  horizontal  decompositions,  i.e.,  different  combinations  (NPZ,  NPxy). 


Figure  2:  Computational  time  per  gridpoint  for  a  fixed  amount  of  work  per  processor  (an 
example  of  weak  scaling).  Red,  green,  and  blue  lines  60,000  points/processor  for  different  ma¬ 
chines.  Cray  XT4  red  line;  Dual  core  IBM  SP5+  green  line;  Single  core  IBM  SP5  blue  line. 
Black  lines  and  symbols  524,288  points/processor  for  Cray  XT4.  For  a  fixed  number  of  total 
processors  NP  multiple  symbols  are  different  combinations  of  (NPZ,  NPxy). 
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Figure  3:  Visualization  of  the  verti¬ 
cal  velocity  field  in  a  convective  PBL 
at  different  heights  from  a  5123  simu¬ 
lation  Plumes  near  the  inversion  can 
trace  their  origin  to  the  hexagon  pat¬ 
terns  in  the  surface  layer.  The  color 
bar  is  in  units  of  ms-1. 


Figure  4:  Visualization  of  particles  released  in  a  convective  PBL  at  zj zt  ~  0.2  over  a  limited 

horizontal  extent  from  a  10243  simulation  of  convection.  The  viewed  area  is  ~  3.8%  of  the  total 
horizontal  domain.  Notice  the  evolution  of  the  larger  scale  line  of  convection  into  small  scale 
vortical  dust  devils.  Time  advances  from  left  to  right  beginning  along  the  top  row  of  images. 


